Automatic extraction of bilingual chunk lexicon for spoken language translation

Authors

  • Limin Du
  • Boxing Chen
Abstract

In spoken communication, an utterance can be segmented into a sequence of chunks, each of which is syntactically well formed, semantically meaningful, and composed of several words. The order of words within a chunk is usually fixed, whereas the order of chunks within an utterance is rather flexible. Spoken language translation could therefore benefit from bilingual chunks. This paper presents a statistical algorithm that builds a bilingual chunk lexicon automatically from spoken language corpora. Several association measures serve as the extraction criteria, and a local-best algorithm, length-ratio filtering, and stop-word filtering are incorporated to improve performance. A bilingual chunk lexicon was extracted from a corpus with a precision of 86.0% and a recall of 86.7%. The usability of the chunk lexicon was then tested in a new framework for English-to-Chinese spoken language translation, resulting in translation accuracies of 81.83% and 78.69% on the training and test sets respectively, measured with a similarity score based on Levenshtein distance.
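The abstract does not say how the Levenshtein-based similarity score is normalized. The sketch below shows one common choice, assuming the score is one minus the token-level edit distance divided by the length of the longer sequence. The function names `levenshtein` and `similarity` are illustrative and do not come from the paper.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences; insert, delete, substitute each cost 1."""
    # prev[j] holds the distance between the first i-1 items of a and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # distance from the first i items of a to an empty prefix of b
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(hyp: Sequence[str], ref: Sequence[str]) -> float:
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(hyp), len(ref))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein(hyp, ref) / longest
```

Under this assumption, a translation hypothesis identical to the reference scores 1.0, and each token that must be inserted, deleted, or substituted lowers the score in proportion to the longer sequence length. Other normalizations (for example, dividing by the reference length only) would give different numbers.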

Similar resources

A Comparative Study on Translation Units for Bilingual Lexicon Extraction

This paper presents ongoing research on automatic extraction of bilingual lexicons from English-Japanese parallel corpora. The main objective of this paper is to examine various N-gram models for generating translation units for bilingual lexicon extraction. Three N-gram models, a baseline model (Bound-length N-gram) and two new models (Chunk-bound N-gram and Dependency-linked N-gram), are compared...

Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources

This paper describes our experiments on two cross-lingual and one monolingual English text retrieval tasks at CLEF in the ad-hoc track. The cross-language task involves retrieving English documents in response to queries in the two most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, ’Shabdanjali’, consisting of approx. 26K H...

Bilingual LSA-based translation lexicon adaptation for spoken language translation

We present a bilingual LSA (bLSA) framework for translation lexicon adaptation. The idea is to apply marginal adaptation to a translation lexicon so that the lexicon marginals match the in-domain marginals. In the framework of speech translation, the bLSA method transfers topic distributions from the source to the target side, such that the translation lexicon can be adapted before translation ba...

Extraction de lexiques bilingues à partir de Wikipédia (Bilingual lexicon extraction from Wikipedia) [in French]

With the increased interest in machine translation, the need for multilingual resources such as comparable corpora and bilingual lexicons has increased. These resources are mostly unavailable for language pairs that do not involve English. This...

Large-Scale Automatic Extraction of an English-Chinese Translation

We report experimental results on automatic extraction of an English-Chinese translation lexicon, by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant vocabulary and corpus size. The learned vocabulary size is abo...

Publication date: 2003